A1 Part 2

Author

Ashmita Biswas, Ihina Purohit, Manal Nazer, Shashwat Gupta

A1 Part 2: Fertility

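The Fertility data set (the Rdatasets copy of Fertility from the AER package) is a data frame containing 254,654 observations on 8 variables. The CSV read below carries a ninth column, rownames, which is only a row index.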

In our exploratory data analysis (EDA), we have examined the data set through several steps: inspecting the data, making assumptions about the variables, validating those assumptions, summarizing the information, applying filters, visualizing the patterns, and finally, drawing meaningful insights.

Data Dictionary

Qualitative Variables

children_gender: Factor indicating the genders of the first two children, combined in order of birth. The levels are male-male, male-female, female-male and female-female.

havemorekids: Factor indicating whether the mother has more than 2 children.

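ethnicity: Factor indicating the ethnicity of the mother. The ethnicity can be Caucasian, Hispanic, AFAM (African American), Other (ethnicities other than the ones mentioned), AFAM+Hispanic, AFAM+Other or Hispanic+Other. Caucasian is assigned when none of the afam, hispanic and other flags is set. AFAM+Other has no observations in this data set, so only six levels appear in the output below.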

Quantitative Variables

mother_age: age of the mother in years.

work_weeks: number of weeks the mother worked in the year 1979.

Our questions before performing EDA: 

  • Does having more kids correlate with working fewer weeks?

  • Does the gender of the 2 existing children play a role in having a third child?

  • Do ethnicities differ in average weeks worked?

  • At what ages is work participation highest/lowest?

  • Are non-working mothers clustered by age, ethnicity, or whether or not they have more than 2 children?

  • Do “mixed ethnicity” mothers show a different pattern?


Setting up libraries

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.2
✔ ggplot2   4.0.0     ✔ tibble    3.3.0
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.1.0     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(mosaic)
Registered S3 method overwritten by 'mosaic':
  method                           from   
  fortify.SpatialPolygonsDataFrame ggplot2

The 'mosaic' package masks several functions from core packages in order to add 
additional features.  The original behavior of these functions should not be affected by this.

Attaching package: 'mosaic'

The following object is masked from 'package:Matrix':

    mean

The following objects are masked from 'package:dplyr':

    count, do, tally

The following object is masked from 'package:purrr':

    cross

The following object is masked from 'package:ggplot2':

    stat

The following objects are masked from 'package:stats':

    binom.test, cor, cor.test, cov, fivenum, IQR, median, prop.test,
    quantile, sd, t.test, var

The following objects are masked from 'package:base':

    max, mean, min, prod, range, sample, sum
library(skimr)

Attaching package: 'skimr'

The following object is masked from 'package:mosaic':

    n_missing
library(tinytable)

Attaching package: 'tinytable'

The following object is masked from 'package:ggplot2':

    theme_void
library(visdat)
library(naniar)

Attaching package: 'naniar'

The following object is masked from 'package:skimr':

    n_complete
library(janitor)

Attaching package: 'janitor'

The following objects are masked from 'package:stats':

    chisq.test, fisher.test
library(DT)
library(ggformula)
library(crosstable)

Attaching package: 'crosstable'

The following object is masked from 'package:purrr':

    compact
theme_set(theme_minimal(base_family = "sans"))

Reading the data..

Fertility <- readr::read_delim("https://vincentarelbundock.github.io/Rdatasets/csv/AER/Fertility.csv")
Rows: 254654 Columns: 9
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (6): morekids, gender1, gender2, afam, hispanic, other
dbl (3): rownames, age, work

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Let’s take a glimpse()

dplyr::glimpse(Fertility)
Rows: 254,654
Columns: 9
$ rownames <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18…
$ morekids <chr> "no", "no", "no", "no", "no", "no", "no", "no", "no", "no", "…
$ gender1  <chr> "male", "female", "male", "male", "female", "male", "female",…
$ gender2  <chr> "female", "male", "female", "female", "female", "female", "ma…
$ age      <dbl> 27, 30, 27, 35, 30, 26, 29, 33, 29, 27, 28, 28, 35, 34, 32, 2…
$ afam     <chr> "no", "no", "no", "yes", "no", "no", "no", "no", "no", "no", …
$ hispanic <chr> "no", "no", "no", "no", "no", "no", "no", "no", "no", "no", "…
$ other    <chr> "no", "no", "no", "no", "no", "no", "no", "no", "no", "no", "…
$ work     <dbl> 0, 30, 0, 0, 22, 40, 0, 52, 0, 0, 0, 52, 52, 52, 8, 7, 0, 40,…

Observations:

  • Multiple entries of the work column contain “0” values.
  • Keep in mind: work counts the weeks worked in the year, not hours, so it ranges from 0 to 52. For reference, the traditional American business hours are 9:00 a.m. to 5:00 p.m., Monday to Friday: a workweek of five eight-hour days, or 40 hours in total. This is the origin of the phrase 9-to-5, used to describe a conventional and possibly tedious job.

Exploring the work week data:

Fertility %>% 
  dplyr::count(work>0)
# A tibble: 2 × 2
  `work > 0`      n
  <lgl>       <int>
1 FALSE      120141
2 TRUE       134513

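We found that nearly half of the parents in the data set (120,141 of 254,654, about 47%) did not work at all (0 weeks) during the year. Among those who did work, the glimpse above already shows values well below 52 (for example 30, 22, 8 and 7 weeks), which suggests part-year work for at least some of them.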
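As a quick check on the "nearly half" figure, the share of zero-week mothers can be recomputed from the two counts printed above. This is a minimal base R sketch: the variable names are ours, and the counts are copied from the output rather than re-read from the data.

```r
# Counts copied from the dplyr::count(work > 0) output above
n_zero_weeks <- 120141   # rows with work == 0
n_some_weeks <- 134513   # rows with work > 0

# Share of mothers who did not work at all during the year
share_zero <- n_zero_weeks / (n_zero_weeks + n_some_weeks)
round(100 * share_zero, 1)   # as a percentage
```

The two counts add up to the 254,654 rows reported by glimpse(), so no rows are lost to missing values in work.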

One hypothesis could be that this data set describes mothers who are full-time homemakers, unemployed, or on an extended break from the labor force. We are also led to this hypothesis because the data set only contains information for one parent, and fertility data traditionally feature mothers more often than fathers.

Munging the data..

We mutated the columns to make the data set more comprehensible. To improve readability, we consolidated the different ethnicities into a single column, ensuring that mixed ethnicities were also grouped together to give them a clearer representation.

Additionally, we renamed certain factors to make the data easier to interpret.

Fertility_2 <- Fertility %>%
  # Collapse the three yes/no ethnicity flags into a single label
  mutate(ethnicity = case_when(
    afam == "yes" & hispanic == "no" & other == "no" ~ "AFAM",
    afam == "no" & hispanic == "yes" & other == "no" ~ "Hispanic",
    afam == "no" & hispanic == "no" & other == "yes" ~ "Other",
    afam == "no" & hispanic == "no" & other == "no" ~ "Caucasian",
    afam == "yes" & hispanic == "yes" & other == "no" ~ "AFAM+Hispanic",
    afam == "yes" & hispanic == "no" & other == "yes" ~ "AFAM+Other",
    afam == "no" & hispanic == "yes" & other == "yes" ~ "Hispanic+Other"
  )) %>%
  # Combine the genders of the first two children, in birth order
  mutate(children_gender = case_when(
    gender1 == "male" & gender2 == "male" ~ "male-male",
    gender1 == "male" & gender2 == "female" ~ "male-female",
    gender1 == "female" & gender2 == "male" ~ "female-male",
    gender1 == "female" & gender2 == "female" ~ "female-female"
  )) %>%
  # Drop the source columns that the two new columns replace
  select(-c(gender1, gender2, afam, hispanic, other)) %>%
  mutate(
    morekids = as.factor(morekids),
    ethnicity = as.factor(ethnicity),
    children_gender = as.factor(children_gender)
  ) %>%
  dplyr::relocate(ethnicity, children_gender, work, morekids, .after = age) %>%
  rename(
    "mother_age" = age,
    "work_weeks" = work,
    "havemorekids" = morekids
  )


Fertility_2
# A tibble: 254,654 × 6
   rownames mother_age ethnicity children_gender work_weeks havemorekids
      <dbl>      <dbl> <fct>     <fct>                <dbl> <fct>       
 1        1         27 Caucasian male-female              0 no          
 2        2         30 Caucasian female-male             30 no          
 3        3         27 Caucasian male-female              0 no          
 4        4         35 AFAM      male-female              0 no          
 5        5         30 Caucasian female-female           22 no          
 6        6         26 Caucasian male-female             40 no          
 7        7         29 Caucasian female-male              0 no          
 8        8         33 Caucasian male-male               52 no          
 9        9         29 Caucasian female-male              0 no          
10       10         27 Caucasian male-female              0 no          
# ℹ 254,644 more rows
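The case_when() rules for ethnicity list seven of the eight possible yes/no combinations of afam, hispanic and other. A row matching none of them would silently become NA. The base R sketch below reproduces the same seven rules as a lookup table (the names combos and rule_lookup are ours, purely for illustration) and lists any combination left uncovered.

```r
# All 8 possible combinations of the three yes/no flags
combos <- expand.grid(
  afam = c("no", "yes"),
  hispanic = c("no", "yes"),
  other = c("no", "yes"),
  stringsAsFactors = FALSE
)

# The seven rules used in the munging step, keyed as "afam hispanic other"
rule_lookup <- c(
  "yes no no"  = "AFAM",
  "no yes no"  = "Hispanic",
  "no no yes"  = "Other",
  "no no no"   = "Caucasian",
  "yes yes no" = "AFAM+Hispanic",
  "yes no yes" = "AFAM+Other",
  "no yes yes" = "Hispanic+Other"
)

# Look up each combination; unmatched keys come back as NA
key <- paste(combos$afam, combos$hispanic, combos$other)
combos$ethnicity <- unname(rule_lookup[key])

# Combinations that no rule covers
combos[is.na(combos$ethnicity), ]
```

Only the all-"yes" combination is uncovered. On the real data, a check such as sum(is.na(Fertility_2$ethnicity)) would confirm whether any row falls into that case; the ethnicity counts further down add up to all 254,654 rows, which suggests none do.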

What does all this data mean??

Counting the ethnicities in this data set, and then the number of mothers with and without more than 2 children within each ethnicity:

Fertility_2 %>% 
  dplyr::count(ethnicity) 
# A tibble: 6 × 2
  ethnicity           n
  <fct>           <int>
1 AFAM            12960
2 AFAM+Hispanic     196
3 Caucasian      216033
4 Hispanic        11117
5 Hispanic+Other   7584
6 Other            6764
Fertility_2 %>% 
   dplyr::count(ethnicity, havemorekids) 
# A tibble: 12 × 3
   ethnicity      havemorekids      n
   <fct>          <fct>         <int>
 1 AFAM           no             7027
 2 AFAM           yes            5933
 3 AFAM+Hispanic  no              104
 4 AFAM+Hispanic  yes              92
 5 Caucasian      no           137344
 6 Caucasian      yes           78689
 7 Hispanic       no             5562
 8 Hispanic       yes            5555
 9 Hispanic+Other no             3522
10 Hispanic+Other yes            4062
11 Other          no             4183
12 Other          yes            2581

The data set is dominated by Caucasian women, who make up 216,033 of the 254,654 rows (about 85%); every other ethnic group is far smaller.

The findings of this exploratory data analysis are to be taken with this in mind.

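In most ethnic groups, the majority of women don’t have more than 2 kids. Hispanic+Other is the exception (4,062 of 7,584 mothers have more than 2), and the Hispanic group is almost evenly split (5,562 without versus 5,555 with).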
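Because the groups differ so much in size, raw counts are hard to compare. The sketch below turns the counts printed above into the share of mothers with more than 2 children per ethnicity. It uses base R only, and the data frame is typed in by hand from the output, so treat it as an illustration rather than part of the original pipeline.

```r
# Counts copied from the dplyr::count(ethnicity, havemorekids) output above
kids_counts <- data.frame(
  ethnicity = c("AFAM", "AFAM+Hispanic", "Caucasian",
                "Hispanic", "Hispanic+Other", "Other"),
  no  = c(7027, 104, 137344, 5562, 3522, 4183),
  yes = c(5933, 92, 78689, 5555, 4062, 2581),
  stringsAsFactors = FALSE
)

# Share of mothers in each group who have more than 2 children
kids_counts$share_yes <- kids_counts$yes / (kids_counts$no + kids_counts$yes)

# Sort from the highest share to the lowest
kids_ranked <- kids_counts[order(-kids_counts$share_yes), ]
kids_ranked[, c("ethnicity", "share_yes")]
```

Sorting by share rather than by count keeps the very large Caucasian group from dominating the comparison.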

One hypothesis could be that some of these are single-parent or co-parenting households, or that nuclear families are more common here than joint families. Access to healthcare, income, and societal or personal factors could also play a role in mothers having 2 children and not more.

Visualizing the data..

Fertility_2 %>%
  gf_props(~ethnicity,
    fill = ~havemorekids,
    position = "fill"
  ) %>%
  gf_labs(
    title = "Share of Mothers Having More Than 2 Children",
    subtitle = "Data across different ethnicities, by proportion"
  ) %>%
  gf_refine(scale_fill_brewer(palette = "Set1"))

“Caucasian” and “Other” have the highest proportions of mothers with only 2 children (about 64% and 62%, respectively), while “Hispanic+Other” has the highest proportion of mothers with more than 2 children (about 54%).

How does sex education/ access to healthcare play a role here, across ethnicities?

Does the gender of the 2 existing children play a role in having a third child?

Let’s find out:

Fertility_2 %>% 
  gf_props(~ethnicity,
    fill = ~children_gender,
    position = "fill"
  ) %>%
  gf_facet_grid(~havemorekids) %>% 
  gf_labs(
    title = "Distribution of Gender Combinations of Children Across Ethnicities",
    subtitle = "Faceted by whether the mother has more than 2 children, by proportion"
  ) %>%
  gf_refine(scale_fill_brewer(palette = "Set1"))

Let’s plot a better graph, one with position=dodge to understand the child preferences better.

Fertility_2 %>%
  gf_bar(~ethnicity,
    fill = ~children_gender,
    position = "dodge"
  ) %>%
  gf_facet_grid(~havemorekids) %>% 
  gf_labs(title = "Distribution of Gender Combinations of Children Across Ethnicities",
          subtitle="Faceted by whether the mother has more than 2 children") %>%
  gf_refine(scale_fill_brewer(palette = "Set1"))

Now let’s make a graph faceted by ethnicity, leaving out Caucasian mothers, to understand the smaller groups better:

Fertility_2 %>%
  filter(ethnicity!="Caucasian") %>% 
  gf_bar(~havemorekids,
    fill = ~children_gender,
    position = "dodge"
  ) %>%
  gf_facet_grid(~ethnicity) %>% 
  gf_labs(title = "Distribution of Gender Combinations of Children Across Ethnicities",
          subtitle="Grouped by whether the mother has more than 2 children, Caucasian excluded") %>%
  gf_refine(scale_fill_brewer(palette = "Set1"))

Awesome!

The data suggest that the decision to have a third child is not simply based on whether the parents already have a daughter or a son, but rather on whether their existing children are of the same gender (male-male or female-female). Parents with two children of the same gender are more likely to go on to have a third child, possibly driven by the hope of “balancing” genders within the family. Conversely, families who already have both a male and a female child (a “balanced” gender ratio) are less likely to have more children.

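This pattern does not support our initial assumption of a subtle gender bias (sexism), under which the female-female combination would be more strongly associated with trying for another child, reflecting a cultural preference for sons.

The pattern we assumed does, however, show up in the “Other” ethnicity: going by the crosstable counts, about 46% of mothers with two daughters (726 of 1,572) have more than 2 children, against about 38% of mothers with two sons (680 of 1,803). So the hypothesis cannot be dismissed entirely.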
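The bar charts make this comparison by eye; the same comparison can be made with numbers. The sketch below computes the share of mothers who have more than 2 children for each gender combination, using the N values from the crosstable overview later in this report for two groups (Caucasian and Other). The data frame is typed in by hand and the column names are ours, so this is a hedged illustration, not the authors' method.

```r
# N values copied from the crosstable overview (Caucasian and Other only)
combo_counts <- data.frame(
  ethnicity = rep(c("Caucasian", "Other"), each = 4),
  children_gender = rep(c("female-female", "female-male",
                          "male-female", "male-male"), times = 2),
  no  = c(30506, 35651, 35926, 35261,  846, 1110, 1104, 1123),
  yes = c(21132, 17564, 17539, 22454,  726,  566,  609,  680),
  stringsAsFactors = FALSE
)

# Share of mothers in each cell who went on to have more than 2 children
combo_counts$share_third <- combo_counts$yes /
  (combo_counts$no + combo_counts$yes)

# Flag the same-sex combinations for an easy side-by-side reading
combo_counts$same_sex <- combo_counts$children_gender %in%
  c("female-female", "male-male")

combo_counts[, c("ethnicity", "children_gender", "same_sex", "share_third")]
```

In both groups the two same-sex rows have higher shares than the two mixed-sex rows. In the Other group the female-female share is also clearly higher than the male-male share, which is the asymmetry discussed above.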

Let’s see how the number of weeks worked relates to having more than 2 children:

Fertility_2 %>%
  gf_histogram(~work_weeks | havemorekids ) %>% 
  gf_labs(title= "Distribution of Work Weeks Among Mothers",
          subtitle= "Who has more children?")
`stat_bin()` using `bins = 30`. Pick better value `binwidth`.
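The stat_bin() message asks for an explicit bin width. Since work_weeks takes whole-number values from 0 to 52, one bin per week keeps the spikes at 0 and 52 separate from their neighbours. The base R sketch below illustrates the idea on made-up toy data (toy_weeks is hypothetical, not the Fertility data). In the plot above, the corresponding change would presumably be to pass binwidth = 1 to gf_histogram().

```r
set.seed(42)

# Hypothetical toy data: many zeros, many full-year values, some in between
toy_weeks <- c(rep(0, 40), rep(52, 25),
               base::sample(1:51, 35, replace = TRUE))

# One bin per week: edges at -0.5, 0.5, ..., 52.5 centre each bin on an integer
week_breaks <- seq(-0.5, 52.5, by = 1)
binned <- hist(toy_weeks, breaks = week_breaks, plot = FALSE)

length(binned$counts)       # number of bins
binned$counts[c(1, 53)]     # counts in the 0-week and 52-week bins
```

With the default of 30 bins over a 0 to 52 range, each bar covers nearly two weeks, so the zero-week spike is mixed with mothers who worked one week.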

Mothers who are not employed (0 weeks of work in a year) form the tallest bar in both histograms.

Both histograms also show a spike among mothers who work the full year (52 weeks). This could suggest that these mothers need to support their families financially, possibly as single parents, or that they are ambitious and committed to maintaining their careers while raising children.

One possible interpretation is that mothers with more children may need to work more in order to provide for them. However, the data also show that many working mothers stop at two children, as do many mothers who are not working at all.
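One way to probe the "work more to provide" interpretation is to compare average work weeks between the two havemorekids groups. The sketch below does this for Caucasian mothers only, combining the per-cell means and N values reported in the crosstable overview later in this report into one weighted mean per group. The numbers are copied by hand and rounded to one decimal in the source, so the result is approximate.

```r
# Caucasian cells from the crosstable: one value per children_gender level
# (female-female, female-male, male-female, male-male)
n_no     <- c(30506, 35651, 35926, 35261)   # mothers with exactly 2 children
mean_no  <- c(20.4, 20.3, 20.5, 20.4)       # their mean work_weeks

n_yes    <- c(21132, 17564, 17539, 22454)   # mothers with more than 2 children
mean_yes <- c(15.2, 14.9, 15.1, 14.6)       # their mean work_weeks

# Weight each cell mean by its cell size to get one mean per group
weeks_no  <- weighted.mean(mean_no,  w = n_no)
weeks_yes <- weighted.mean(mean_yes, w = n_yes)

round(c(two_children = weeks_no, more_than_two = weeks_yes), 1)
```

For this group, mothers with more than 2 children work fewer weeks on average, not more, which sits uneasily with the "work more to provide" reading. It is a descriptive comparison only and says nothing about cause.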

This leads us to questioning:

Why do some mothers choose not to have more than two children regardless of their work status?

Do mothers stop working after having more than two children because childcare demands leave them with little time for employment, or are they unable to continue working due to the increased motherly responsibilities?

Do factors like age and ethnicity also play a role in such decisions?

Let’s find out:

The summary below shows us the distribution of ages of mothers with 2 or more children.

Fertility_2 %>% 
  dplyr::summarise(mean_mother_age= mean(mother_age, na.rm=T),
    sd_mother_age = sd(mother_age, na.rm=T),
    min_mother_age = min(mother_age, na.rm=T),
    max_mother_age = max(mother_age, na.rm=T))
# A tibble: 1 × 4
  mean_mother_age sd_mother_age min_mother_age max_mother_age
            <dbl>         <dbl>          <dbl>          <dbl>
1            30.4          3.39             21             35

Now let’s look at this through ethnicities:

Fertility_2 %>% 

  group_by(ethnicity) %>% 
   dplyr::summarise(mean_mother_age= mean(mother_age, na.rm=T),
    sd_mother_age = sd(mother_age, na.rm=T),
    min_mother_age = min(mother_age, na.rm=T),
    max_mother_age = max(mother_age, na.rm=T))
# A tibble: 6 × 5
  ethnicity      mean_mother_age sd_mother_age min_mother_age max_mother_age
  <fct>                    <dbl>         <dbl>          <dbl>          <dbl>
1 AFAM                      29.9          3.52             21             35
2 AFAM+Hispanic             30.0          3.60             22             35
3 Caucasian                 30.5          3.34             21             35
4 Hispanic                  29.9          3.63             21             35
5 Hispanic+Other            29.2          3.71             21             35
6 Other                     30.7          3.26             21             35

Summarising work weeks:

Fertility_2 %>% 
  group_by(ethnicity) %>% 
 dplyr::summarise(
    mean_work_weeks = mean(work_weeks, na.rm=T),
    sd_work_weeks = sd(work_weeks, na.rm=T),
    min_work_weeks = min(work_weeks, na.rm=T),
    max_work_weeks = max(work_weeks, na.rm=T),
    
  )
# A tibble: 6 × 5
  ethnicity      mean_work_weeks sd_work_weeks min_work_weeks max_work_weeks
  <fct>                    <dbl>         <dbl>          <dbl>          <dbl>
1 AFAM                      29.2          22.4              0             52
2 AFAM+Hispanic             22.4          23.2              0             52
3 Caucasian                 18.4          21.7              0             52
4 Hispanic                  18.1          21.7              0             52
5 Hispanic+Other            18.3          21.5              0             52
6 Other                     21.4          22.8              0             52

African American (AFAM) women tend to work more on average (around 29 weeks a year) compared to other ethnicities. One of the factors contributing to this could be the greater economic or social pressures that their community has to face that make continuous participation in the labor market less of a choice and more of a necessity.

Hispanic, Hispanic+Other and Caucasian women record the lowest mean number of work weeks (18.1, 18.3 and 18.4, respectively).
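The ranking is easy to misread from an alphabetically ordered table. The sketch below sorts the group means printed above from lowest to highest; the data frame is typed in by hand from that output and the names are ours.

```r
# Mean work weeks per ethnicity, copied from the summary table above
work_summary <- data.frame(
  ethnicity = c("AFAM", "AFAM+Hispanic", "Caucasian",
                "Hispanic", "Hispanic+Other", "Other"),
  mean_work_weeks = c(29.2, 22.4, 18.4, 18.1, 18.3, 21.4),
  stringsAsFactors = FALSE
)

# Sort from the fewest to the most weeks worked
work_ranked <- work_summary[order(work_summary$mean_work_weeks), ]
work_ranked

# Gap between the highest and the lowest group mean, in weeks
diff(range(work_summary$mean_work_weeks))
```

The three lowest means lie within 0.3 weeks of each other, while every group has a standard deviation above 21 weeks, so the ordering among those three should not be over-interpreted.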

Similarly, other hypotheses could be:

  • A possible explanation is that Hispanic women may benefit from joint-family like support systems, allowing them to focus less on paid work while still managing household responsibilities.

  • Caucasian women, on the other hand, may be less likely to head single-parent households. If Caucasian men also tend to earn higher wages than men in other ethnic groups, there may be less financial necessity for these women to engage in consistent employment.

    Do cultural norms and socioeconomic positioning shape women’s participation in the labor force?

Let us try plotting boxplots in order to see how that looks across ages:

Fertility_2 %>% 
  gf_boxplot(ethnicity~mother_age, orientation = "y", color =~ havemorekids) %>% 
  gf_labs(title="Ages of mothers across ethnicities",
          subtitle="Coloured by whether the mother has more than 2 children")

For most ethnicities, the age distribution for mothers with more kids skews slightly higher than those without.

One finding is that the age profile of mothers differs somewhat between ethnic groups. Note that mother_age is the mother’s age at the time of the survey, not her age at childbirth:

  • Mothers in the Hispanic+Other category have the widest spread of ages (standard deviation 3.71 years in the summary above).

  • In contrast, groups like Other have narrower boxes (standard deviation 3.26 years), meaning most mothers fall within a tighter age range.

  • Hispanic and Hispanic+Other mothers are slightly younger on average (mean ages 29.9 and 29.2) than Caucasian or Other mothers (30.5 and 30.7). AFAM mothers (29.9) sit close to the Hispanic group.

Crosstable Overview

crosstable(mother_age + work_weeks ~ havemorekids+children_gender + ethnicity,
  data = Fertility_2
) %>%
  crosstable::as_flextable()

Summary of mother_age and work_weeks for each combination of ethnicity, children_gender and havemorekids. N (NA) is identical for both variables, so it is shown once.

                                                           mother_age                                  work_weeks
ethnicity       children_gender  havemorekids  N (NA)      Min / Max    Med [IQR]         Mean (std)   Min / Max  Med [IQR]         Mean (std)
AFAM            female-female    no            1619 (0)    21.0 / 35.0  30.0 [27.0;32.0]  29.6 (3.5)   0 / 52.0   44.0 [7.5;52.0]   32.8 (21.6)
AFAM            female-female    yes           1522 (0)    21.0 / 35.0  31.0 [28.0;33.0]  30.2 (3.5)   0 / 52.0   25.0 [0;51.0]     25.1 (22.7)
AFAM            female-male      no            1812 (0)    21.0 / 35.0  30.0 [27.0;33.0]  29.8 (3.5)   0 / 52.0   40.0 [4.8;52.0]   31.9 (21.8)
AFAM            female-male      yes           1411 (0)    21.0 / 35.0  31.0 [28.0;33.0]  30.2 (3.5)   0 / 52.0   26.0 [0;52.0]     25.3 (22.6)
AFAM            male-female      no            1866 (0)    21.0 / 35.0  30.0 [27.0;33.0]  29.7 (3.5)   0 / 52.0   44.0 [10.0;52.0]  32.9 (21.4)
AFAM            male-female      yes           1401 (0)    21.0 / 35.0  31.0 [28.0;33.0]  30.2 (3.5)   0 / 52.0   24.0 [0;52.0]     24.8 (22.6)
AFAM            male-male        no            1730 (0)    21.0 / 35.0  30.0 [27.0;32.0]  29.5 (3.5)   0 / 52.0   41.0 [4.0;52.0]   31.8 (21.8)
AFAM            male-male        yes           1599 (0)    21.0 / 35.0  31.0 [28.0;33.0]  30.5 (3.4)   0 / 52.0   30.0 [0;52.0]     26.3 (22.5)
AFAM+Hispanic   female-female    no            22 (0)      23.0 / 34.0  27.5 [25.0;29.8]  27.8 (3.2)   0 / 52.0   4.5 [0;47.5]      19.3 (23.5)
AFAM+Hispanic   female-female    yes           25 (0)      23.0 / 35.0  29.0 [28.0;32.0]  29.6 (3.3)   0 / 52.0   0 [0;48.0]        16.8 (23.4)
AFAM+Hispanic   female-male      no            23 (0)      22.0 / 35.0  32.0 [27.0;34.0]  30.7 (3.9)   0 / 52.0   12.0 [0;50.0]     21.3 (23.3)
AFAM+Hispanic   female-male      yes           16 (0)      22.0 / 34.0  30.5 [27.0;33.0]  29.9 (3.7)   0 / 52.0   0 [0;36.8]        16.2 (21.0)
AFAM+Hispanic   male-female      no            34 (0)      25.0 / 35.0  29.5 [27.2;32.0]  29.4 (3.2)   0 / 52.0   33.0 [0;52.0]     27.8 (23.9)
AFAM+Hispanic   male-female      yes           20 (0)      22.0 / 35.0  32.0 [29.0;34.0]  30.8 (4.0)   0 / 52.0   0 [0;26.5]        16.0 (20.8)
AFAM+Hispanic   male-male        no            25 (0)      22.0 / 35.0  29.0 [27.0;33.0]  29.7 (3.6)   0 / 52.0   8.0 [0;50.0]      21.7 (23.9)
AFAM+Hispanic   male-male        yes           31 (0)      22.0 / 35.0  32.0 [31.0;34.0]  31.5 (3.5)   0 / 52.0   38.0 [7.0;52.0]   31.8 (21.8)
Caucasian       female-female    no            30506 (0)   21.0 / 35.0  31.0 [28.0;33.0]  30.1 (3.4)   0 / 52.0   10.0 [0;48.0]     20.4 (22.1)
Caucasian       female-female    yes           21132 (0)   21.0 / 35.0  32.0 [29.0;33.0]  30.9 (3.2)   0 / 52.0   0 [0;34.0]        15.2 (20.5)
Caucasian       female-male      no            35651 (0)   21.0 / 35.0  31.0 [28.0;33.0]  30.2 (3.4)   0 / 52.0   10.0 [0;48.0]     20.3 (22.1)
Caucasian       female-male      yes           17564 (0)   21.0 / 35.0  32.0 [29.0;33.0]  31.0 (3.2)   0 / 52.0   0 [0;32.0]        14.9 (20.4)
Caucasian       male-female      no            35926 (0)   21.0 / 35.0  31.0 [28.0;33.0]  30.3 (3.4)   0 / 52.0   10.0 [0;48.0]     20.5 (22.1)
Caucasian       male-female      yes           17539 (0)   21.0 / 35.0  32.0 [29.0;34.0]  31.0 (3.2)   0 / 52.0   0 [0;34.0]        15.1 (20.5)
Caucasian       male-male        no            35261 (0)   21.0 / 35.0  31.0 [28.0;33.0]  30.2 (3.4)   0 / 52.0   10.0 [0;48.0]     20.4 (22.1)
Caucasian       male-male        yes           22454 (0)   21.0 / 35.0  32.0 [29.0;33.0]  30.9 (3.2)   0 / 52.0   0 [0;30.0]        14.6 (20.3)
Hispanic        female-female    no            1252 (0)    21.0 / 35.0  30.0 [26.0;32.0]  29.4 (3.7)   0 / 52.0   8.0 [0;48.0]      20.3 (22.2)
Hispanic        female-female    yes           1466 (0)    21.0 / 35.0  31.0 [28.0;33.0]  30.3 (3.5)   0 / 52.0   0 [0;34.0]        15.1 (20.6)
Hispanic        female-male      no            1451 (0)    21.0 / 35.0  30.0 [27.0;32.0]  29.4 (3.7)   0 / 52.0   8.0 [0;50.0]      21.2 (22.8)
Hispanic        female-male      yes           1257 (0)    21.0 / 35.0  31.0 [28.0;33.0]  30.5 (3.4)   0 / 52.0   0 [0;34.0]        15.3 (20.6)
Hispanic        male-female      no            1480 (0)    21.0 / 35.0  30.0 [27.0;32.0]  29.3 (3.7)   0 / 52.0   8.0 [0;48.0]      20.3 (22.2)
Hispanic        male-female      yes           1326 (0)    21.0 / 35.0  31.0 [28.0;33.0]  30.2 (3.4)   0 / 52.0   0 [0;36.0]        16.0 (20.8)
Hispanic        male-male        no            1379 (0)    21.0 / 35.0  29.0 [27.0;32.0]  29.2 (3.7)   0 / 52.0   12.0 [0;49.0]     21.1 (22.4)
Hispanic        male-male        yes           1506 (0)    21.0 / 35.0  31.0 [28.0;33.0]  30.5 (3.4)   0 / 52.0   0 [0;32.0]        15.1 (20.8)
Hispanic+Other  female-female    no            812 (0)     21.0 / 35.0  29.0 [25.0;32.0]  28.5 (3.7)   0 / 52.0   12.0 [0;50.0]     21.7 (22.5)
Hispanic+Other  female-female    yes           1018 (0)    21.0 / 35.0  30.0 [27.0;33.0]  29.8 (3.6)   0 / 52.0   0 [0;32.8]        15.6 (20.3)
Hispanic+Other  female-male      no            940 (0)     21.0 / 35.0  29.0 [25.0;31.2]  28.5 (3.7)   0 / 52.0   16.0 [0;50.0]     22.5 (22.5)
Hispanic+Other  female-male      yes           923 (0)     21.0 / 35.0  30.0 [27.0;33.0]  29.7 (3.6)   0 / 52.0   0 [0;28.0]        14.3 (19.5)
Hispanic+Other  male-female      no            894 (0)     21.0 / 35.0  28.0 [26.0;32.0]  28.5 (3.8)   0 / 52.0   18.5 [0;50.0]     23.0 (22.5)
Hispanic+Other  male-female      yes           986 (0)     21.0 / 35.0  31.0 [28.0;33.0]  30.1 (3.5)   0 / 52.0   0 [0;30.0]        14.5 (19.9)
Hispanic+Other  male-male        no            876 (0)     21.0 / 35.0  28.0 [25.0;31.0]  28.4 (3.8)   0 / 52.0   16.0 [0;50.0]     22.4 (22.5)
Hispanic+Other  male-male        yes           1135 (0)    21.0 / 35.0  30.0 [28.0;33.0]  29.9 (3.4)   0 / 52.0   0 [0;32.0]        14.9 (20.4)
Other           female-female    no            846 (0)     21.0 / 35.0  31.0 [28.0;33.0]  30.5 (3.4)   0 / 52.0   16.0 [0;51.8]     23.2 (23.3)
Other           female-female    yes           726 (0)     21.0 / 35.0  31.0 [29.0;33.0]  30.7 (3.3)   0 / 52.0   0 [0;42.8]        17.5 (21.6)
Other           female-male      no            1110 (0)    21.0 / 35.0  31.0 [29.0;33.0]  30.9 (3.2)   0 / 52.0   19.5 [0;52.0]     23.6 (23.0)
Other           female-male      yes           566 (0)     21.0 / 35.0  32.0 [29.0;33.0]  31.0 (3.1)   0 / 52.0   1.0 [0;44.0]      18.4 (21.9)
Other           male-female      no            1104 (0)    21.0 / 35.0  31.0 [29.0;33.0]  30.6 (3.2)   0 / 52.0   16.0 [0;52.0]     22.9 (22.9)
Other           male-female      yes           609 (0)     21.0 / 35.0  31.0 [28.0;34.0]  30.8 (3.3)   0 / 52.0   0 [0;47.0]        18.6 (22.3)
Other           male-male        no            1123 (0)    21.0 / 35.0  31.0 [29.0;33.0]  30.8 (3.1)   0 / 52.0   20.0 [0;52.0]     24.4 (23.3)
Other           male-male        yes           680 (0)     21.0 / 35.0  32.0 [28.0;34.0]  30.7 (3.5)   0 / 52.0   0 [0;40.0]        16.9 (21.7)

Data Table:

Fertility_2 %>% 
  DT::datatable()
Warning in instance$preRenderHook(instance): It seems your data is too big for
client-side DataTables. You may consider server-side processing:
https://rstudio.github.io/DT/server.html
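The warning appears because all 254,654 rows are sent to the browser. Server-side processing, as the message suggests, needs a running Shiny session. For a static report, a lighter option is to show only a random preview of rows. The base R sketch below illustrates the sampling step on a hypothetical stand-in data frame (toy_fertility is made up, and the preview size of 1,000 is an arbitrary choice of ours).

```r
set.seed(2024)

# Hypothetical stand-in with the same number of rows as Fertility_2
toy_fertility <- data.frame(
  rownames = seq_len(254654),
  work_weeks = base::sample(0:52, 254654, replace = TRUE)
)

# Draw 1,000 row positions without replacement and keep only those rows
preview_rows <- base::sample(nrow(toy_fertility), size = 1000)
preview <- toy_fertility[preview_rows, ]

nrow(preview)
```

On the real data, the same idea would be to pass a sampled subset of Fertility_2 to DT::datatable() instead of the full table. base::sample is spelled out because mosaic masks sample in this session.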

Conclusion

Our exploratory analysis shows that fertility and mothers’ participation in the workforce are shaped by many overlapping factors such as age, ethnicity, family structure, and cultural norms.

Most mothers in this data set stop at two children, but those who go on to have more tend to be slightly older and, going by the crosstable, work fewer weeks on average.

Ethnicity also matters: African American mothers work more weeks on average, while Hispanic, Hispanic+Other and Caucasian mothers record the fewest, suggesting different economic pressures and support systems.

The gender of existing children also plays a role, with families who already have two children of the same sex more likely to have a third, reflecting a desire for balance rather than a clear gender bias.

Are cultural expectations or financial needs the main reason why mothers stop at two children or choose to have more?

To what extent are their work choices voluntary, and to what extent are they shaped by inequalities in childcare, wages, and access to healthcare?

Most importantly: What do these findings tell us about the everyday trade-offs mothers make between personal goals, cultural traditions, and the social systems around them?